Cross Language Prediction of Vandalism on Wikipedia Using Article Views and Revisions
نویسندگان
چکیده
Vandalism is a major issue on Wikipedia, accounting for about 2% (350,000+) of edits in the first 5 months of 2012. The majority of vandalism are caused by humans, who can leave traces of their malicious behaviour through access and edit logs. We propose detecting vandalism using a range of classifiers in a monolingual setting, and evaluated their performance when using them across languages on two data sets: the relatively unexplored hourly count of views of each Wikipedia article, and the commonly used edit history of articles. Within the same language (English and German), these classifiers achieve up to 87% precision, 87% recall, and F1-score of 87%. Applying these classifiers across languages achieve similarly high results of up to 83% precision, recall, and F1-score. These results show characteristic vandal traits can be learned from view and edit patterns, and models built in one language can be applied to other languages.
منابع مشابه
Detecting Vandalism on Wikipedia across Multiple Languages
Vandalism, the malicious modification or editing of articles, is a serious problem for free and open access online encyclopedias such as Wikipedia. Over the 13 year lifetime of Wikipedia, editors have identified and repaired vandalism in 1.6% of more than 500 million revisions of over 9 million English articles, but smaller manually inspected sets of revisions for research show vandalism may ap...
متن کاملDetecting Wikipedia Vandalism using WikiTrust
WikiTrust is a reputation system for Wikipedia authors and content. WikiTrust computes three main quantities: edit quality, author reputation, and content reputation. The edit quality measures how well each edit, that is, each change introduced in a revision, is preserved in subsequent revisions. Authors who perform good quality edits gain reputation, and text which is revised by several high-r...
متن کاملWHAD: Wikipedia historical attributes data - Historical structured data extraction and vandalism detection from the Wikipedia edit history
This paper describes the generation of temporally anchored infobox attribute data from the Wikipedia history of revisions. By mining (attribute, value) pairs from the revision history of the English Wikipedia we are able to collect a comprehensive knowledge base that contains data on how attributes change over time. When dealing with the Wikipedia edit history, vandalic and erroneous edits are ...
متن کاملDetecting Wikipedia Vandalism using WikiTrust - Lab Report for PAN at CLEF 2010
WikiTrust is a reputation system for Wikipedia authors and content. WikiTrust computes three main quantities: edit quality, author reputation, and content reputation. The edit quality measures how well each edit, that is, each change introduced in a revision, is preserved in subsequent revisions. Authors who perform good quality edits gain reputation, and text which is revised by several high-r...
متن کاملWikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features
Wikipedia is an online encyclopedia which anyone can edit. While most edits are constructive, about 7% are acts of vandalism. Such behavior is characterized by modifications made in bad faith; introducing spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013